introduce a new integrated "codeflash optimize" command by misrasaurabh1 · Pull Request #384 · codeflash-ai/codeflash

misrasaurabh1 · 2025-06-26T03:50:18Z

PR Type

Enhancement, Tests

Description

Introduce integrated optimize CLI command
Add FunctionRanker class for ttX-based ranking
Extend tracer: generate replay test and invoke optimizer
Update tests: unit and end-to-end optimize flows

Changes diagram

flowchart LR
  CLI["codeflash optimize command"] --> TR["Tracer.trace code"]
  TR --> RT["Generate replay test"]
  RT --> FR["FunctionRanker.rerank_and_filter"]
  FR --> OPT["Optimizer.run_with_args"]
  OPT --> OUT["Optimization results"]

Changes walkthrough 📝

Relevant files

Enhancement

7 files

workload.py Add sleep and heavy compute in `SimpleModel.predict`	+10/-1
function_ranker.py New `FunctionRanker` for function profiling ranking	+144/-0
cli.py Add `optimize` subcommand and CI flag parsing	+10/-1
env_utils.py Add `is_ci()` helper for CI environment detection	+6/-0
functions_to_optimize.py Integrate trace path and rerank functions workflow	+53/-3
tracer.py Support replay tests, static methods, optimization chaining	+51/-8
profile_stats.py Include `class_name` and normalize caller keys	+14/-2

Configuration changes

2 files

config_consts.py Define `DEFAULT_IMPORTANCE_THRESHOLD` constant	+1/-0
pyproject.toml Configure pytest warning filters	+6/-0

Formatting

3 files

server.py Fix import order and comma formatting in LSP server	+7/-6
server_entry.py Change `setup_logging` return type signature	+4/-3
posthog_cf.py Clean up trailing comments and commas	+2/-2

Miscellaneous

1 file

pickle_patcher.py Remove debug print in placeholder creation	+0/-2

Tests

3 files

end_to_end_test_tracer_replay.py Update traced count and coverage expectations	+2/-2
end_to_end_test_utilities.py Switch to `codeflash.main optimize` invocation	+10/-24
test_function_ranker.py Add unit tests for `FunctionRanker` methods	+172/-0

Additional files

1 file

codeflash.trace

github-actions · 2025-06-26T03:51:30Z

PR Reviewer Guide 🔍

(Review updated until commit `9addd95`)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 5 🔵🔵🔵🔵🔵
🧪 PR contains tests
🔒 Security concerns
Sensitive information exposure: A live PostHog API key is committed in codeflash/telemetry/posthog_cf.py, which could be abused if extracted. Replace it with secure retrieval (e.g., an environment variable) and remove the embedded key.

⚡ Recommended focus areas for review

Sensitive Info
The PostHog project API key is hardcoded in source, risking exposure of credentials.

    _posthog = Posthog(project_api_key="phc_aUO790jHd7z1SXwsYCz8dRApxueplZlZWeDSpKc5hol", host="https://us.posthog.com")

CLI Parsing
Using `parse_known_args` with `sys.argv` reassignment may drop or reorder flags, leading to unexpected behavior for subcommands.

    args, unknown_args = parser.parse_known_args()
    sys.argv[:] = [sys.argv[0], *unknown_args]

Performance Bottleneck
The nested loops and sleep in `SimpleModel.predict` will significantly slow benchmarking and may skew optimization results.

    sleep(0.1)  # can be optimized away
    for i in range(500):
        for x in data:
            computation = 0
            computation += x * i ** 2
            result.append(computation)
    return result
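The environment-variable retrieval recommended above could look roughly like the following. This is a minimal sketch, not the project's code: `load_posthog_key` is a hypothetical helper, and `POSTHOG_API_KEY` is an assumed variable name. Wrapping the lookup in a function lets the caller decide whether a missing key disables telemetry or raises an error.

```python
import os
from typing import Mapping, Optional


def load_posthog_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the PostHog project key from the environment, or None if unset.

    POSTHOG_API_KEY is an assumed variable name; this helper is hypothetical.
    """
    key = env.get("POSTHOG_API_KEY", "").strip()
    # Treat an empty or whitespace-only value the same as a missing one.
    return key or None
```

A caller would skip telemetry initialization when the helper returns `None` instead of falling back to a key embedded in source.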

github-actions · 2025-06-26T03:52:38Z

PR Code Suggestions ✨

Latest suggestions up to 9addd95
Explore these optional code suggestions:

Category: Security · Impact: High
Suggestion: Load API key from environment
Avoid committing sensitive API keys to source. Load the PostHog project_api_key from an environment variable at runtime and error out if it's missing.
codeflash/telemetry/posthog_cf.py [24]

    -_posthog = Posthog(project_api_key="phc_aUO790jHd7z1SXwsYCz8dRApxueplZlZWeDSpKc5hol", host="https://us.posthog.com")
    +import os
    +project_api_key = os.getenv("POSTHOG_API_KEY")
    +if not project_api_key:
    +    logger.error("POSTHOG_API_KEY environment variable is not set")
    +    return
    +_posthog = Posthog(project_api_key=project_api_key, host="https://us.posthog.com")

Suggestion importance [1-10]: 9
Why: Hardcoding the PostHog key is a security risk; loading from `POSTHOG_API_KEY` ensures credentials aren’t committed and fails fast if not set.

Category: Possible issue · Impact: Low
Suggestion: Support both schema versions
The code now unpacks an extra `class_name` column which may not exist in the pstats schema, leading to unpack errors. Handle both schema versions by inspecting row length and defaulting `class_name` when missing.
codeflash/tracing/profile_stats.py [26-36]

    -for (
    -    filename,
    -    line_number,
    -    function,
    -    class_name,
    -    call_count_nonrecursive,
    -    num_callers,
    -    total_time_ns,
    -    cumulative_time_ns,
    -    callers,
    -) in pdata:
    +for row in pdata:
    +    # support both old and new schemas
    +    if len(row) == 8:
    +        filename, line_number, function, call_count_nonrecursive, num_callers, total_time_ns, cumulative_time_ns, callers = row
    +        class_name = None
    +    else:
    +        filename, line_number, function, class_name, call_count_nonrecursive, num_callers, total_time_ns, cumulative_time_ns, callers = row
    +    loaded_callers = json.loads(callers)
    +    ...

Suggestion importance [1-10]: 6
Why: The new `class_name` unpack assumes an updated schema and may break older traces; checking row length and defaulting `class_name` maintains backward compatibility.
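One way to keep the row-length check out of the loop body is to normalize every row to the new layout first. The sketch below assumes the column order quoted in the suggestion above; `normalize_row` and the two width constants are hypothetical names, not part of `profile_stats.py`.

```python
from typing import Any, Sequence, Tuple

OLD_WIDTH = 8  # assumed: rows written before class_name existed
NEW_WIDTH = 9  # assumed: rows with class_name in the fourth position


def normalize_row(row: Sequence[Any]) -> Tuple[Any, ...]:
    """Return a 9-column row, inserting class_name=None into old-schema rows."""
    if len(row) == NEW_WIDTH:
        return tuple(row)
    if len(row) == OLD_WIDTH:
        # class_name follows (filename, line_number, function) in the new layout.
        return (*row[:3], None, *row[3:])
    raise ValueError(f"unexpected pstats row width: {len(row)}")
```

With this in place the loop can keep a single nine-name unpack, and an unexpected width fails loudly instead of being silently misread.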

Previous suggestions

Suggestions up to commit 4debe7e

Category: Possible issue · Impact: Medium
Suggestion: Include optimize subcommand in invocation
Include the "optimize" subcommand in the `sys.argv` array so the top-level CLI parser recognizes it and invokes the optimization phase correctly.
codeflash/tracer.py [873]

    -sys.argv = ["codeflash", "--replay-test", str(replay_test_path)]
    +sys.argv = ["codeflash", "optimize", "--replay-test", str(replay_test_path)]

Suggestion importance [1-10]: 8
Why: The CLI invocation omits the "optimize" subcommand, causing the top-level parser to misinterpret "--replay-test", so including it is critical for a correct workflow.

Category: General · Impact: Low
Suggestion: Limit sys.argv mutation to optimize
Restrict the mutation of `sys.argv` to only when the `"optimize"` command is active, so other subcommands receive their intended arguments and avoid unintended stripping.
codeflash/cli_cmds/cli.py [73-74]

     args, unknown_args = parser.parse_known_args()
    -sys.argv[:] = [sys.argv[0], *unknown_args]
    +if args.command == "optimize":
    +    sys.argv[:] = [sys.argv[0], *unknown_args]

Suggestion importance [1-10]: 6
Why: Restricting the global `sys.argv` rewrite to the optimize command prevents other subcommands from losing their arguments and improves correctness.

Category: General · Impact: Low
Suggestion: Hoist tracer_main import
Move the import of `tracer_main` to the module level to avoid repeated imports each time `parse_args` is called and to clarify dependencies.
codeflash/cli_cmds/cli.py [26-27]

    -trace_optimize = subparsers.add_parser("optimize", help="Trace and optimize a Python project.")
     from codeflash.tracer import main as tracer_main
    +trace_optimize = subparsers.add_parser("optimize", help="Trace and optimize a Python project.")
    +

Suggestion importance [1-10]: 3
Why: Moving the import to module scope reduces repeated imports in `parse_args`, but the performance gain is minimal and the gain in clarity is modest.
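The argument-forwarding pattern discussed in these suggestions can be illustrated in isolation. `argparse.ArgumentParser.parse_known_args` returns a namespace plus a list of the strings it did not recognize, so that list has to be unpacked when rebuilding an argv-style list. The sketch below is a simplified stand-in for the real CLI: `split_args` is a hypothetical function and the parser defines only the subcommand name.

```python
import argparse
from typing import List, Tuple


def split_args(argv: List[str]) -> Tuple[argparse.Namespace, List[str]]:
    """Parse the subcommand and return the leftover arguments as a flat argv list."""
    parser = argparse.ArgumentParser(prog="codeflash")
    subparsers = parser.add_subparsers(dest="command")
    subparsers.add_parser("optimize")
    args, unknown = parser.parse_known_args(argv[1:])
    # Unpack the leftovers; [argv[0], unknown] would nest a list inside argv.
    forwarded = [argv[0], *unknown]
    return args, forwarded
```

For example, `split_args(["codeflash", "optimize", "--replay-test", "x.py"])` yields `command == "optimize"` and forwards `["codeflash", "--replay-test", "x.py"]` to whatever parses the arguments next.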

…(`trace-and-optimize`)

Here is an optimized rewrite of your `FunctionRanker` class.

**Key speed optimizations applied:**

1. **Avoid repeated loading of function stats:** The original code reloads function stats for each function during ranking (`get_function_ttx_score()` is called per function and loads/returns). We prefetch stats once in `rank_functions()` and reuse them for all lookups.
2. **Inline and batch lookups:** We use a helper to batch compute scores directly via a pre-fetched `stats` dict. This removes per-call overhead from attribute access and creation of possible keys inside the hot loop.
3. **Minimal string operations:** We precompute the two possible key formats needed for lookup (file:qualified and file:function) for all items only ONCE, instead of per invocation.
4. **Skip list-comprehension in favor of tuple-unpacking:** Use generator expressions for lower overhead when building output.
5. **Fast path with `dict.get()` lookup:** Avoid redundant `if key in dict` by just trying `dict.get(key)`.
6. **Do not change signatures or behavior. Do not rename any classes or functions. All logging, ordering, functionality is preserved.**

**Summary of performance impact:**

- The stats are loaded only once, not per function.
- String concatenations for keys are only performed twice per function (and not redundantly in both `rank_functions` and `get_function_ttx_score`).
- All lookup and sorting logic remains as in the original so results will match, but runtime (especially for large lists) will be significantly better.
- If you want, you could further optimize by memoizing scores with LRU cache, but with this design, dictionary operations are already the bottleneck, and this is the lowest-overhead idiomatic Python approach.
- No imports, function names, or signatures are changed.

Let me know if you need further GPU-based or numpy/pandas-style speedups!
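The first optimization in the list, loading stats once and reusing them for every lookup, can be sketched independently of the real class. Everything named here is hypothetical: `rank_names` stands in for `rank_functions`, the loader callable stands in for whatever reads the trace, and a plain float stands in for the actual score.

```python
from typing import Callable, Dict, List

StatsLoader = Callable[[], Dict[str, float]]


def rank_names(names: List[str], load_stats: StatsLoader) -> List[str]:
    """Rank names by score, highest first, calling the loader only once."""
    stats = load_stats()  # hoisted out of the per-function loop
    # Names without stats score 0.0; sorted() is stable, so ties keep input order.
    return sorted(names, key=lambda name: stats.get(name, 0.0), reverse=True)
```

The design point is simply that the potentially expensive load happens outside the sort key, so its cost no longer scales with the number of functions being ranked.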

codeflash-ai · 2025-06-30T19:14:18Z

⚡️ Codeflash found optimizations for this PR

📄 13% (0.13x) speedup for `FunctionRanker.rank_functions` in `codeflash/benchmarking/function_ranker.py`

⏱️ Runtime : 1.84 milliseconds → 1.62 milliseconds (best of 67 runs)

I created a new dependent PR with the suggested changes. Please review:

⚡️ Speed up method FunctionRanker.rank_functions by 13% in PR #384 (trace-and-optimize) #458

If you approve, it will be merged into this PR (branch trace-and-optimize).

…384 (`trace-and-optimize`)

Here is an **optimized** version of your code, focusing on the `_get_function_stats` function, the proven performance bottleneck per your line profiling.

### Optimizations Applied

1. **Avoid Building Unneeded Lists**:
   - Creating `possible_keys` as a list incurs per-call overhead.
   - Instead, directly check both keys in sequence, avoiding the list entirely.
2. **Short-circuit Early Return**:
   - Check for the first key (`qualified_name`) and return immediately if found (no need to compute or check the second unless necessary).
3. **String Formatting Optimization**:
   - Use f-strings directly in the condition rather than storing/interpolating beforehand.
4. **Comment Retention**:
   - All existing and relevant comments are preserved, though your original snippet has no in-method comments.

---

### Rationale

- **No lists** or unneeded temporary objects are constructed.
- Uses `.get`, which is faster than `in` + lookup.
- Returns immediately upon match.

---

**This change will reduce total runtime and memory usage significantly in codebases with many calls to `_get_function_stats`.** Function signatures and return values are unchanged.
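The short-circuit lookup described above might look roughly like this. It is a sketch under assumptions: the `file:qualified_name` and `file:function_name` key formats are inferred from the earlier comment in this thread, and the free function below is hypothetical, not the actual `_get_function_stats` method.

```python
from typing import Any, Dict, Optional


def get_function_stats(
    stats: Dict[str, Dict[str, Any]],
    file: str,
    qualified_name: str,
    function_name: str,
) -> Optional[Dict[str, Any]]:
    """Try the qualified key first and fall back to the bare function name."""
    found = stats.get(f"{file}:{qualified_name}")
    if found is not None:
        return found  # early return: the second key is never built
    return stats.get(f"{file}:{function_name}")
```

Each call performs at most two dictionary lookups and builds no intermediate list of candidate keys.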

codeflash-ai · 2025-07-01T22:08:52Z

⚡️ Codeflash found optimizations for this PR

📄 51% (0.51x) speedup for `FunctionRanker._get_function_stats` in `codeflash/benchmarking/function_ranker.py`

⏱️ Runtime : 497 microseconds → 330 microseconds (best of 51 runs)

I created a new dependent PR with the suggested changes. Please review:

⚡️ Speed up method FunctionRanker._get_function_stats by 51% in PR #384 (trace-and-optimize) #466

If you approve, it will be merged into this PR (branch trace-and-optimize).

codeflash-ai · 2025-07-01T22:10:31Z

This PR is now faster! 🚀 @misrasaurabh1 accepted my optimizations from:

⚡️ Speed up method FunctionRanker._get_function_stats by 51% in PR #384 (trace-and-optimize) #466

github-actions · 2025-07-02T02:52:57Z

Persistent review updated to latest commit 9addd95

…imize`)

Here's an optimized version of your Python program, focused on runtime and memory.

**Key changes:**

- Avoids reading the event file or parsing JSON if not needed.
- Reads the file as binary and parses with `json.loads()` for slightly faster IO.
- References the `"draft"` property directly using `.get()` to avoid possible `KeyError`.
- Reduces scope of data loaded from JSON for less memory usage.
- Caches the result of parsing the event file for repeated calls within the same process.
- The inner try/except is kept close to only catching the specific case.
- Results for each event_path file are cached in memory.
- Exception handling and comments are preserved where their context is changed.
- I/O and JSON parsing is only done if both env vars are set and PR number exists.
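The caching and binary-read ideas in this list can be sketched as follows. This is not the project's `is_pr_draft`: `read_draft_flag` is a hypothetical helper, it omits the environment-variable checks mentioned above, and it assumes the draft flag sits under a `pull_request` object in the event payload.

```python
import json
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def read_draft_flag(event_path: str) -> bool:
    """Parse an event payload once per path and report its draft flag."""
    try:
        # json.loads accepts bytes, so the file is read in binary mode.
        payload = json.loads(Path(event_path).read_bytes())
    except (OSError, ValueError):
        return False  # unreadable or malformed payloads count as "not a draft"
    if not isinstance(payload, dict):
        return False
    pull_request = payload.get("pull_request") or {}
    return bool(pull_request.get("draft", False))
```

Because the cache is keyed on the path string, repeated calls within one process read and parse each event file at most once. The trade-off is that a file rewritten during the process lifetime would not be re-read.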

codeflash-ai · 2025-07-03T05:59:00Z

⚡️ Codeflash found optimizations for this PR

📄 121% (1.21x) speedup for `is_pr_draft` in `codeflash/code_utils/env_utils.py`

⏱️ Runtime : 4.98 milliseconds → 2.25 milliseconds (best of 94 runs)

I created a new dependent PR with the suggested changes. Please review:

⚡️ Speed up function is_pr_draft by 121% in PR #384 (trace-and-optimize) #499

If you approve, it will be merged into this PR (branch trace-and-optimize).

KRRT7 · 2025-07-03T19:12:55Z

@misrasaurabh1 ready to review, can't tag you normally b/c you're the author

aseembits93 · 2025-07-03T22:55:26Z

+[tool.pytest.ini_options]
+filterwarnings = [
+    "ignore::pytest.PytestCollectionWarning",
+    "ignore::pytest.PytestUnknownMarkWarning"


reasoning for PytestUnknownMarkWarning, @KRRT7?

KRRT7 · 2025-07-03

In codeflash/models/models.py we have a few classes that are prefixed with Test, like

    @dataclass(frozen=True)
    class TestsInFile:
        test_file: Path
        test_class: Optional[str]
        test_function: str
        test_type: TestType

which pytest complains about. It is very noisy, so I've disabled them.

aseembits93 · 2025-07-03

Any examples of PytestUnknownMarkWarning specifically that you've seen so far? I've not seen them yet.

KRRT7 · 2025-07-03

It gets triggered on the pytest tests that have the skip_ci markers.
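As an alternative to a global warning filter for the collection warning, pytest skips any class whose `__test__` attribute is `False`. The sketch below shows that opt-out on a dataclass shaped like the one quoted in this thread; it is an illustration, not what the PR does, and the `test_type` field is omitted because `TestType` is not defined here.

```python
from dataclasses import dataclass, fields
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class TestsInFile:
    # Unannotated, so this is a plain class attribute, not a dataclass field.
    # pytest does not try to collect classes whose __test__ is False.
    __test__ = False

    test_file: Path
    test_class: Optional[str]
    test_function: str
```

This keeps the opt-out next to the class that triggers the warning, at the cost of repeating it on every `Test`-prefixed model.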

introduce a new integrated "codeflash optimize" command

4debe7e

misrasaurabh1 marked this pull request as draft June 26, 2025 03:50

github-actions Bot added the Review effort 4/5 label Jun 26, 2025

KRRT7 and others added 3 commits June 26, 2025 18:33

Merge branch 'main' into trace-and-optimize

535a9b1

Merge branch 'main' into trace-and-optimize

09bf156

rank functions

0b4fcb6

KRRT7 force-pushed the trace-and-optimize branch from 08464c4 to 0b4fcb6 Compare June 30, 2025 19:04

Merge branch 'main' into trace-and-optimize

059b4dc

codeflash-ai Bot mentioned this pull request Jun 30, 2025

⚡️ Speed up method FunctionRanker.rank_functions by 13% in PR #384 (trace-and-optimize) #458

Closed

KRRT7 and others added 5 commits June 30, 2025 16:06

implement reranker

7f9a609

allow predict to be included

eb9e0c6

fix tracer for static methods

ce68cad

Merge branch 'main' into trace-and-optimize

b7258a9

codeflash-ai Bot mentioned this pull request Jul 1, 2025

⚡️ Speed up method FunctionRanker._get_function_stats by 51% in PR #384 (trace-and-optimize) #466

Merged

Merge pull request #466 from codeflash-ai/codeflash/optimize-pr384-20…

67bd717

…25-07-01T22.08.43 ⚡️ Speed up method `FunctionRanker._get_function_stats` by 51% in PR #384 (`trace-and-optimize`)

KRRT7 and others added 8 commits July 1, 2025 16:53

update tests

ea16342

don't let the AI replicate

947ab07

Merge branch 'main' into trace-and-optimize

4823ee5

ruff

faebe9b

mypy-ruff

a0e57ba

silence test collection warnings

fd1e492

Update function_ranker.py

f7c8a6b

Update workload.py

35059a9

github-actions Bot added Review effort 5/5 and removed Review effort 4/5 labels Jul 2, 2025

KRRT7 added 4 commits July 2, 2025 18:44

rank only, change formula

70cecaf

per module ranking

96acfc7

update tests

e5e1ff0

move to env utils, pre-commit

eba8cb8

codeflash-ai Bot mentioned this pull request Jul 3, 2025

⚡️ Speed up function is_pr_draft by 121% in PR #384 (trace-and-optimize) #499

Closed

Merge branch 'main' of https://github.com/codeflash-ai/codeflash into…

9955081

… trace-and-optimize

misrasaurabh1 commented Jul 3, 2025

Comment thread code_to_optimize/code_directories/simple_tracer_e2e/codeflash.trace

Merge branch 'main' into trace-and-optimize

692f46e

misrasaurabh1 requested a review from aseembits93 July 3, 2025 22:48

misrasaurabh1 enabled auto-merge July 3, 2025 22:51

aseembits93 reviewed Jul 3, 2025

Comment thread codeflash/cli_cmds/cli.py

aseembits93 reviewed Jul 3, 2025

Comment thread codeflash/cli_cmds/cli.py

KRRT7 and others added 5 commits July 3, 2025 16:20

add markers

e2e6803

Merge branch 'main' into trace-and-optimize

4560b8b

Update cli.py

39e0859

Revert "Update cli.py"

c09f32e

This reverts commit 39e0859.

allow args for the optimize command too

60922b8

misrasaurabh1 commented Jul 4, 2025

Comment thread codeflash/cli_cmds/cli.py

misrasaurabh1 commented Jul 4, 2025

Comment thread codeflash/cli_cmds/cli.py

KRRT7 added 2 commits July 3, 2025 17:20

fix parsing

bf6313f

fix parsing

87f44a2

misrasaurabh1 disabled auto-merge July 4, 2025 03:01

misrasaurabh1 merged commit 75810a3 into main Jul 4, 2025
16 of 17 checks passed
